REMARKS

Entry of this response is proper under 37 CFR §1.116, since no new claims or issues 
are presented. 

Claims 1-20 are all the claims presently pending in the application. 

It is noted that the claim amendments, if any, are made only for more particularly 
pointing out the invention, and not for distinguishing the invention over the prior art, 
narrowing the claims or for any statutory requirements of patentability. Further, Applicant 
specifically states that no amendment to any claim herein should be construed as a disclaimer 
of any interest in or right to an equivalent of any element or feature of the amended claim. 

The Examiner's earlier objection to claims 4, 10, and 16 is understood as having 
been withdrawn, since this objection is not repeated in the rejection currently of record. 

Claims 1-20 stand rejected under 35 U.S.C. § 101 as allegedly directed to non- 
statutory subject matter. Claims 1, 6, 7, 12, 13, and 18 stand rejected under 35 U.S.C. § 
102(b) as allegedly anticipated by co-inventor Gustavson's own prior publication 
("Superscalar GEMM-based Level 3 BLAS - The On-going Evolution of a Portable and 
High-Performance Library"). Claims 3, 4, 9, 10, 15, and 16 stand rejected under 35 U.S.C. § 
103(a) as allegedly unpatentable over Gustavson, further in view of US Patent 6,357,041 to 
Pingali et al. Claims 19 and 20 stand rejected under 35 U.S.C. § 103(a) as allegedly 
unpatentable over Gustavson, further in view of "PLAPACK: Parallel Linear Algebra 
Package Design Overview" by Philip (Alpatov) et al. 

Claims 1, 5-7, 11-13, 17, and 18 stand rejected under nonstatutory obviousness-type 
double patenting over claims 21 and 22 of co-pending application 10/671,934. Claims 3, 4, 
9, 10, 15, and 16 stand rejected under nonstatutory obviousness-type double patenting over 
claims 21 and 22 of co-pending application 10/671,934, further in view of Pingali. 

These rejections are respectfully traversed in the following discussion. 

I. THE CLAIMED INVENTION 

As described, for example, by independent claim 1, the claimed invention is directed 
to a method of improving at least one of speed and efficiency when executing a level 3 dense 
linear algebra subroutine on a computer, including automatically setting an optimal machine 
state on the computer for the processing by selecting an optimal matrix subroutine from 
among a plurality of matrix subroutines stored in a memory that could alternatively perform a 
level 3 matrix multiplication.

Serial No. 10/671,935
Docket No. YOR920030330US1 (YOR.485)

As explained at lines 7-11 of page 4 of the specification, the conventional wisdom for 
linear algebra processing considers that only one kernel type is available for matrix 
multiplication. However, as explained at lines 11-14 of page 5, such a limitation of having a 
single kernel available for matrix multiplication forces data copying that limits efficiency of 
the multiplication processing. 

The claimed invention, on the other hand, provides a method to reduce and/or 
eliminate such data copying by allowing a selection of an optimal kernel for the processing, 
as selected based on which matrix would most optimally reside in L1 cache. 

II. THE NON-STATUTORY SUBJECT MATTER REJECTIONS 

Claims 1-20 stand rejected under 35 U.S.C. §101 for allegedly failing to address 
statutory subject matter. 

In paragraph 13.a, beginning on page 10 of the Office Action, the Examiner argues 
that ". . . the claims do not explicitly disclose a practical application of the optimal subroutine 
to perform matrix multiplication. Basically, the claims just disclose a method of selecting a 
subroutine from a set of subroutine[s] to perform a matrix multiplication. The improvement 
of speed/efficiency would not constitute as concrete, useful, and tangible as required under 
35 U.S.C. 101."

In response, Applicants respectfully disagree and submit that the placing of the 
machine into an optimal state by selecting one of possible alternative kernels to perform a 
given processing inherently provides the advantage over conventional methods (wherein only 
one kernel is used) of increasing speed and efficiency. Applicants further submit that 
increasing processing speed and efficiency is exactly the type of results one would desire 
from a patent and this result is even expressly mentioned in the independent claims. 

Therefore, Applicants simply disagree with the Examiner's position that the present 
invention fails to satisfy the "useful, concrete and tangible result" standard of review for 
statutory subject matter, if this test is considered appropriate to apply to the method claims. 

However, it is also noted that the claims include apparatus and Beauregard-type claims, 
and these claims are clearly addressed to statutory subject matter, even if the method claims 
were to be deemed directed to non-statutory subject matter. 

In paragraph 13.b, beginning on page 11 of the Office Action, the Examiner argues 
". . . that the claim language recites a machine-readable storage medium tangibly embodying 
the program but not as a tangible machine readable storage medium embodying the program 
as alleged by the applicant. Further, the specification page 24 lines 10-15 does suggest that 
the machine readable medium can be non-tangible medium such as digital and analog and 
communication links and wireless. Clearly, the machine readable storage medium claims are 
directed to non-tangible medium." 

In response, Applicants respectfully submit that it is the Examiner who summarily 
declares that the description at lines 10-15 of page 24 is both incorporated into the claim 
language and is non-statutory. Applicants respectfully disagree. 

First, it is brought to the Examiner's attention that the claim language itself limits the 
claim to "a machine-readable storage medium tangibly embodying a program of 
machine-readable instructions executable by a digital processing apparatus . . . ." As such, this language 
clearly defines a "process" and is statutory by reason of being one of the four categories 
specifically itemized in 35 USC §101. 

Second, the wording "machine-readable storage medium" clearly includes ROM and 
RAM containing the machine-readable instructions, as well as the standalone disks or 
diskettes of the Beauregard-type claims. Therefore, Applicants submit that, since the 
language clearly covers at least some statutory subject matter, the claimed invention is 
statutory. That is, the evaluation for statutory subject matter is of the invention as a whole, not 
of whether an Examiner is able to interpret the language as possibly including definitions for 
which case law arguably provides no clear answer. 

Moreover, to the extent that the Examiner considers that this language includes the 
description on page 24 of the specification making reference to transmission media, 
Applicants respectfully point out that there is no case law addressing whether a series of 
machine-readable instructions that define a process to be executable by a machine is excluded 
from being statutory subject matter, particularly considering that, as mentioned above, a 
"process" is expressly identified as one of the four categories listed in 35 USC §101. To the 
extent that the Examiner considers that a transmission can "tangibly embody a program of 
machine readable instructions executable by a digital processing apparatus to perform a 
method of . . . ," Applicants again point out that this transmission clearly defines a "process." 

III. THE PRIOR ART REJECTIONS 

The Examiner alleges that co-inventor Gustavson's prior publication "Superscalar 
GEMM-based Level 3 BLAS - The On-going Evolution of a Portable and High-Performance 
Library" teaches the claimed invention defined by claims 1, 6, 7, 12, 13, and 18, and, when 
combined with the teachings of Pingali, renders obvious claims 3, 4, 9, 10, 15, and 16, and 
when combined with the teachings of Philip, renders obvious claims 19 and 20. 

Applicants submit, however, that co-inventor Gustavson, as one of the authors of the 
cited primary reference, declares unequivocally that this publication described ways to write 
other level 3 BLAS in terms of DGEMM and featured only a single kernel and the use of data 
copying. 

In contrast, the present invention describes the potential use of any of six kernel 
routines (one of which can be selected as optimal, particularly in view of one or more of 
the other techniques described in the remaining six co-pending applications) and newer 
forms of data copying called "register blocking" (see co-pending Application S/N 
10/671,888, corresponding to Attorney Docket YOR920030169US1). There is no 
suggestion in the Gustavson publication of using a selected one of six possible kernels, and 
the Examiner fails to point out specific locations reasonably demonstrating such a plurality of 
selectable kernel subroutines. 

That is, the Examiner points to section 3 on pages 208-209, section 3.2 on pages 
210-211, and the first four lines under the introduction section on page 207. 

In response, Applicants respectfully submit that none of these locations even suggests 
the availability of alternative kernels, let alone selecting an optimal kernel from among six 
possible kernels. 

That is, section 3 on pages 208-209 states: 
3 Superscalar GEMM-based level 3 BLAS 

To approach peak performance on state-of-the-art superscalar microprocessors it is 
necessary to attain extensive register reuse. In general, multiple calls to the level 1 and 
level 2 BLAS routines prohibit an efficient register reuse. 

Recently, Kågström and Ling announced the first version of the superscalar GEMM-based 
level 3 BLAS. They have also developed a superscalar DGEMM that currently is 
used with the library. The superscalar library has essentially the same overall 
structure, with similar blocking, as the regular GEMM-based level 3 BLAS. The main 
difference in the design is that all calls to underlying level 1 and level 2 BLAS have been 
removed. As before, the dominating part of all floating point operations take place in calls 
to DGEMM. The remaining computations that take care of triangular diagonal blocks are 
handled by "in-line" code optimized for efficient register reuse. 

Section 3.2 on pages 210-211 states: 
3.2 Improved performance for the superscalar library 

In the current release of the superscalar GEMM-based level 3 BLAS, 4 x 4 unrolling is 
used for the C matrix in DGEMM and 4 x 2 unrolling is used in the remaining routines. As 
for the GEMM-based model implementations all references are stride one which is 
implemented using work arrays and data copying prearranged so that the DGEMM kernel 
will run close to peak performance. The extra data copying allows the superscalar library 
to handle so called "critical" leading dimensions as well [9, 10]. The Fortran source code is 
publically available from netlib, see 'www.netlib.org/blas/gemm_based/ssgemmbased.tgz'. 

Performance results from the GEMM-based level 3 BLAS performance benchmark on 
an IBM PowerPC 604 processor (112 MHz, IBM SP, SMP node) show substantial 
improvements for the current release of the superscalar library: 

DSYMM   DSYRK   DSYR2K   DTRMM   DTRSM
+3%     +28%    +2%      +23%    +25%
These percentage numbers are for square matrices of size 500 x 500. We obtain up to 80% 
improvement for small matrices (32 x 32). The improvements are mainly for the routines 
that called level 2 routines in the model implementations [9, 10]. The GEMM-based 
algorithms for DSYMM and DSYR2K do not call any level 2 routines. The calculations 
are transformed to level 3 GEMM operations by copying the symmetric subblocks stored 
in triangular format to general full format subblocks in work arrays [11]. 

The ATLAS [12] and PHiPAC projects [3] use the superscalar GEMM-based level 3 
BLAS together with their own automatically tuned DGEMM to provide a complete set of 
level 3 BLAS in double precision. The ATLAS project reports impressive performance results 
for several different machines where the combination of the superscalar GEMM-based level 
3 BLAS and ATLAS DGEMM is often faster than the vendor supplied level 3 BLAS, see 
'www.netlib.org/atlas'. 

The first four lines in the introduction on page 207 state: 
1 Introduction 

The level 3 Basic Linear Algebra Subprograms (BLAS) [4] are a de facto standard for 
various matrix multiply and triangular system solving computations and are successfully 
used as building blocks for the development of high-performance dense linear algebra library 
software. 



Nowhere in the above-recited passages is there even a suggestion of alternative 
kernels or a selection of an optimal kernel from among a plurality of kernels that could 
alternately be used, and the Examiner is respectfully requested to point out specific lines 
intended to support his position. Co-inventor and co-author Gustavson states emphatically 
that this paper had no suggestion whatsoever of such alternative kernel selection. 

Therefore, Applicants respectfully submit that the rejection currently of record fails to 
establish a prima facie rejection for either anticipation or obviousness, since it has the 
fundamental flaw of failing to provide a key element of the independent claims. 

The Examiner relies upon secondary references Pingali and Philip for reasons 
unrelated to overcoming this basic deficiency of the primary reference, so that neither of 
these secondary references overcomes the deficiency of the primary reference. 

Hence, turning to the clear language of the claims, in the Gustavson publication there 
is no teaching or suggestion of: ". . . automatically setting an optimal machine state on said 
computer for said processing by selecting an optimal matrix subroutine from among a 
plurality of matrix subroutines stored in a memory that could alternatively perform a level 3 
matrix multiplication processing," as required by independent claim 1. The remaining 
independent claims have similar language. 

Therefore, Applicant submits that there are elements of the claimed invention that are 
not taught or suggested by Gustavson's prior publication, and the Examiner is respectfully 
requested to reconsider and withdraw this rejection. 

Relative to the rejections based on combining secondary references Pingali or Philip 
with the Gustavson publication, Applicants submit that neither secondary reference 
overcomes the deficiency of the primary reference, so that all claims are clearly patentable 
over this publication, even if combined with these two secondary references. 

IV. THE DOUBLE PATENTING REJECTIONS 

Claims 1, 5-7, 11-13, 17, and 18 stand rejected under nonstatutory obviousness-type 
double patenting over claims 21 and 22 of co-pending application S/N 10/671,934, and 
claims 3, 4, 9, 10, 15, and 16 stand rejected under nonstatutory obviousness-type double 
patenting over these claims 21 and 22 of co-pending application S/N 10/671,934, further in 
view of US Patent 6,357,041 to Pingali et al. 

In response, Applicants again respectfully submit that co-pending application S/N 
10/671,934 relates to a specific technique of streaming of data for level 3 matrix 
multiplication processing, not to the selection of an optimal subroutine for performing the 
processing. These procedures are clearly patentably distinct by reason of providing two 
distinctly different results, as evidenced by the different independent claims in the two 
applications. 

That is, claims 21 and 22 of co-pending application S/N 10/671,934 respectively 
depend from an independent claim that requires a determination of which matrix will reside 
in which cache layer. These two dependent claims 21 and 22 have to be interpreted as the 
combination of first determining which matrices reside on the various cache levels followed 
by the step of selecting two kernels from six possible kernels to perform a level three matrix 
multiplication processing. 

In contrast, the independent claims of the present application S/N 10/671,935 define 
the entirely different and unrelated process of determining which one of the alternative kernels to 
use as the subroutine for the matrix processing. None of the rejected claims in the present 
application '935 addresses the determination of a second kernel for the processing, as 
required by the independent claims of co-pending application '934. 

Along this line, it is noted that the rejection currently of record fails to reasonably 
demonstrate any suggestion to determine even one optimal kernel, let alone two optimal 
kernels. Therefore, Applicants submit that it could hardly be considered obvious to select a 
second optimal kernel (as required by dependent claims 21 and 22 of '934), if it has not even 
been demonstrated to select a first optimal kernel. 

The double patenting rejections currently of record provide no objective evidence or 
rationale to support this conclusion of obviousness, contrary to the requirement of the recent 
US Supreme Court holding in KSR: "There must be some articulated reasoning with some 
rational underpinning to support the legal conclusion of obviousness." KSR Int'l v. Teleflex, 
Inc., 127 S. Ct. 1727, 1741, 82 USPQ2d 1385, 1396 (2007). The double patenting rejections 
consist of conclusory statements only; there is no analysis or rationale in these rejections, as 
required by the KSR holding. 

Therefore, Applicants respectfully submit that these rejections fail to meet the initial 
burden of a prima facie obviousness rejection, and the Examiner is respectfully requested to 
reconsider and withdraw these rejections. 

V. FORMAL MATTERS AND CONCLUSION 

In view of the foregoing, Applicant submits that claims 1-20, all the claims presently 
pending in the application, are patentably distinct over the prior art of record and are in 
condition for allowance. The Examiner is respectfully requested to pass the above 
application to issue at the earliest possible time. 

Should the Examiner find the application to be other than in condition for allowance, 
the Examiner is requested to contact the undersigned at the local telephone number listed 
below to discuss any other changes deemed necessary in a telephonic or personal interview. 

The Commissioner is hereby authorized to charge any deficiency in fees or to credit 
any overpayment in fees to Assignee's Deposit Account No. 50-0510. 



Respectfully Submitted, 




Date: December 4, 2007 

Frederick E. Cooperrider 
Registration No. 36,769 

McGinn Intellectual Property Law Group, PLLC 

8321 Old Courthouse Road, Suite 200 
Vienna, VA 22182-3817 
(703) 761-4100 
Customer No. 21254 